14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Chemistry and materials science are complex. Recently, there have been great
successes in addressing this complexity using data-driven or computational
techniques. Yet the need for input structured in very specific forms, together
with an ever-growing number of tools, creates usability and accessibility
challenges. Combined with the fact that much of the data in these disciplines is
unstructured, these challenges limit the effectiveness of these tools.
Motivated by recent work indicating that large language models (LLMs)
might help address some of these issues, we organized a hackathon event on the
applications of LLMs in chemistry, materials science, and beyond. This article
chronicles the projects built as part of this hackathon. Participants employed
LLMs for various applications, including predicting properties of molecules and
materials, designing novel interfaces for tools, extracting knowledge from
unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in
less than two days highlight that LLMs will profoundly impact the future of our
fields. The rich collection of ideas and projects also indicates that the
applications of LLMs are not limited to materials science and chemistry but
offer potential benefits to a wide range of scientific disciplines.
A theoretical analysis of single molecule protein sequencing via weak binding spectra
We propose and theoretically study an approach to massively parallel single molecule peptide sequencing, based on single molecule measurement of the kinetics of probe binding (Havranek, et al., 2013) to the N-termini of immobilized peptides. Unlike previous proposals, this method is robust to both weak and non-specific probe-target affinities, which we demonstrate by applying the method to a range of randomized affinity matrices consisting of relatively low-quality binders. This suggests a novel principle for proteomic measurement whereby highly non-optimized sets of low-affinity binders could be applicable to protein sequencing, thus shifting the burden of amino acid identification from biomolecular design to readout. Measurement of probe occupancy times, or of time-averaged fluorescence, should allow high-accuracy determination of N-terminal amino acid identity for realistic probe sets. The time-averaged fluorescence method scales well to weakly-binding probes with dissociation constants of tens or hundreds of micromolar, and bypasses photobleaching limitations associated with other fluorescence-based approaches to protein sequencing. We argue that this method could lead to an approach with single amino acid resolution and the ability to distinguish many canonical and modified amino acids, even using highly non-optimized probe sets. This readout method should expand the design space for single molecule peptide sequencing by removing constraints on the properties of the fluorescent binding probes.
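To make the time-averaged readout idea concrete, here is a minimal sketch, not the authors' model. It assumes simple 1:1 equilibrium binding, so each probe's time-averaged signal is its fractional occupancy c/(c + Kd). It also assumes a hypothetical random probe set whose Kd values are drawn log-uniformly between 10 and 500 µM, Gaussian readout noise, and a nearest-spectrum (least-squares) call of the N-terminal residue. All function names, the probe concentration, and the noise level are illustrative assumptions.

```python
import math
import random

# Twenty canonical amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occupancy(conc_um: float, kd_um: float) -> float:
    """Equilibrium fraction of time a site is bound, assuming simple 1:1 binding."""
    return conc_um / (conc_um + kd_um)


def random_kd_matrix(n_probes: int, rng: random.Random,
                     lo_um: float = 10.0, hi_um: float = 500.0) -> dict:
    """Hypothetical non-optimized probe set: each (amino acid, probe) Kd is
    drawn log-uniformly between lo_um and hi_um micromolar."""
    lo, hi = math.log(lo_um), math.log(hi_um)
    return {aa: [math.exp(rng.uniform(lo, hi)) for _ in range(n_probes)]
            for aa in AMINO_ACIDS}


def expected_spectrum(kd_row: list, conc_um: float) -> list:
    """Noise-free time-averaged signal per probe for one N-terminal residue."""
    return [occupancy(conc_um, kd) for kd in kd_row]


def measure(kd_row: list, conc_um: float, noise_sd: float,
            rng: random.Random) -> list:
    """Simulated readout: expected occupancy plus Gaussian measurement noise."""
    return [theta + rng.gauss(0.0, noise_sd)
            for theta in expected_spectrum(kd_row, conc_um)]


def identify(measured: list, kd_matrix: dict, conc_um: float) -> str:
    """Call the residue whose expected spectrum is nearest (least squares)."""
    def sq_err(aa: str) -> float:
        expected = expected_spectrum(kd_matrix[aa], conc_um)
        return sum((m - e) ** 2 for m, e in zip(measured, expected))
    return min(kd_matrix, key=sq_err)


if __name__ == "__main__":
    rng = random.Random(0)
    kd = random_kd_matrix(n_probes=20, rng=rng)
    conc = 100.0  # hypothetical probe concentration, micromolar
    trials = 1000
    correct = 0
    for _ in range(trials):
        true_aa = rng.choice(AMINO_ACIDS)
        signal = measure(kd[true_aa], conc, noise_sd=0.01, rng=rng)
        correct += identify(signal, kd, conc) == true_aa
    print(f"accuracy: {correct / trials:.3f}")
```

No single probe in this sketch is specific for any residue. Identification relies only on each residue producing a distinct pattern across the whole probe set, which mirrors the abstract's point that the burden shifts from binder design to readout.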
RNA timestamps identify the age of single molecules in RNA sequencing
© 2020, The Author(s), under exclusive licence to Springer Nature America, Inc. Current approaches to single-cell RNA sequencing (RNA-seq) provide only limited information about the dynamics of gene expression. Here we present RNA timestamps, a method for inferring the age of individual RNAs in RNA-seq data by exploiting RNA editing. To introduce timestamps, we tag RNA with a reporter motif consisting of multiple MS2 binding sites that recruit the adenosine deaminase ADAR2 fused to an MS2 capsid protein. ADAR2 binding to tagged RNA causes A-to-I edits to accumulate over time, allowing the age of the RNA to be inferred with hour-scale accuracy. By combining observations of multiple timestamped RNAs driven by the same promoter, we can determine when the promoter was active. We demonstrate that the system can infer the presence and timing of multiple past transcriptional events. Finally, we apply the method to cluster single cells according to the timing of past transcriptional activity. RNA timestamps will allow the incorporation of temporal information into RNA-seq workflows.
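As an illustration of how an accumulating edit count can be turned into an age, here is a minimal sketch under a deliberately simple, hypothetical model that is not taken from the paper. It assumes each of `n_sites` editable adenosines in the reporter is edited independently and irreversibly at a constant rate `rate_per_h`, so a site is edited by age t with probability 1 - exp(-k t). The function names and the rate parameter are assumptions for illustration only.

```python
import math


def edit_probability(age_h: float, rate_per_h: float) -> float:
    """Probability that one editable site carries an A-to-I edit after age_h hours,
    assuming irreversible first-order editing at rate_per_h (hypothetical model)."""
    return 1.0 - math.exp(-rate_per_h * age_h)


def estimate_age_h(n_edited: int, n_sites: int, rate_per_h: float) -> float:
    """Maximum-likelihood age of one RNA from its count of edited sites.

    With independent sites, n_edited ~ Binomial(n_sites, p(t)); because p(t)
    is monotone in t, the MLE solves p(t) = n_edited / n_sites.
    """
    if n_sites <= 0 or not 0 <= n_edited <= n_sites:
        raise ValueError("need 0 <= n_edited <= n_sites and n_sites > 0")
    if rate_per_h <= 0:
        raise ValueError("rate_per_h must be positive")
    if n_edited == n_sites:
        return math.inf  # every site edited: the data only bound the age from below
    frac = n_edited / n_sites
    # Invert p(t) = 1 - exp(-k t); log1p keeps precision when frac is small.
    return -math.log1p(-frac) / rate_per_h
```

For example, with a hypothetical rate of 0.05 per hour and 12 of 40 sites edited, `estimate_age_h(12, 40, 0.05)` gives about 7.1 hours. Pooling several RNAs from the same promoter, as the abstract describes, would require a model of when transcripts were produced and is not covered by this sketch.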
HyPR-seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes
© 2020 National Academy of Sciences. All rights reserved. Single-cell quantification of RNAs is important for understanding cellular heterogeneity and gene regulation, yet current approaches suffer from low sensitivity for individual transcripts, limiting their utility for many applications. Here we present Hybridization of Probes to RNA for sequencing (HyPR-seq), a method to sensitively quantify the expression of hundreds of chosen genes in single cells. HyPR-seq involves hybridizing DNA probes to RNA, distributing cells into nanoliter droplets, amplifying the probes with PCR, and sequencing the amplicons to quantify the expression of chosen genes. HyPR-seq achieves high sensitivity for individual transcripts, detects nonpolyadenylated and low-abundance transcripts, and can profile more than 100,000 single cells. We demonstrate how HyPR-seq can profile the effects of CRISPR perturbations in pooled screens, detect time-resolved changes in gene expression via measurements of gene introns, and detect rare transcripts and quantify cell-type frequencies in tissue using low-abundance marker genes. By directing sequencing power to genes of interest and sensitively quantifying individual transcripts, HyPR-seq reduces costs by up to 100-fold compared to whole-transcriptome single-cell RNA sequencing, making HyPR-seq a powerful method for targeted RNA profiling in single cells.
14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon
Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.